Results 1 - 20 of 38
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493339

ABSTRACT

Clustering cells based on single-cell multi-modal sequencing technologies provides an unprecedented opportunity to create high-resolution cell atlases, reveal critical cellular states, and study health and disease. However, effectively integrating different sequencing data for cell clustering remains a challenging task. Motivated by the successful application of Louvain in scRNA-seq data, we propose a single-cell multi-modal Louvain clustering framework, called scMLC, to tackle this problem. scMLC builds multiplex single- and cross-modal cell-to-cell networks to capture modal-specific and consistent information between modalities and then adopts a robust multiplex community detection method to obtain reliable cell clusters. In comparison with 15 state-of-the-art clustering methods on seven real datasets simultaneously measuring gene expression and chromatin accessibility, scMLC achieves better accuracy and stability on most datasets. Synthetic results also indicate that the cell-network-based integration strategy of multi-omics data is superior to other strategies in terms of generalization. Moreover, scMLC is flexible and can be extended to single-cell sequencing data with more than two modalities.
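Louvain, the algorithm scMLC builds on, moves nodes between communities as long as modularity increases. The minimal sketch below only computes the standard Newman-Girvan modularity of a single-layer, unweighted graph; it is not scMLC's multiplex objective, and the function name `modularity` and the toy graph are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges:  list of (u, v) pairs, each undirected edge listed once
    labels: dict mapping node -> community id
    Computes Q = sum_c [ L_c / m - (d_c / 2m)^2 ], where L_c is the number of
    edges inside community c and d_c is the total degree of its nodes.
    """
    m = len(edges)
    degree = defaultdict(int)
    intra = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    total_degree = defaultdict(int)  # sum of node degrees per community
    for node, k in degree.items():
        total_degree[labels[node]] += k
    return sum(intra[c] / m - (total_degree[c] / (2 * m)) ** 2 for c in total_degree)
```

For two triangles joined by a single bridge edge, putting each triangle in its own community gives Q = 5/14 (about 0.357), while putting all six nodes in one community gives Q = 0; a Louvain-style search would prefer the first partition.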


Subject(s)
Chromatin , Multiomics , Cluster Analysis , Algorithms , Sequence Analysis, RNA
2.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38230824

ABSTRACT

MOTIVATION: Single-cell RNA sequencing has emerged as a powerful technology for studying gene expression at the individual cell level. Clustering individual cells into distinct subpopulations is fundamental in scRNA-seq data analysis, facilitating the identification of cell types and exploration of cellular heterogeneity. Despite the recent development of many deep learning-based single-cell clustering methods, few have effectively exploited the correlations among genes, resulting in suboptimal clustering outcomes. RESULTS: Here, we propose a novel masked autoencoder-based method, scMAE, for cell clustering. scMAE perturbs gene expression and employs a masked autoencoder to reconstruct the original data, learning robust and informative cell representations. The masked autoencoder introduces a masking predictor, which captures relationships among genes by predicting whether gene expression values are masked. By integrating this masking mechanism, scMAE effectively captures latent structures and dependencies in the data, enhancing clustering performance. We conducted extensive comparative experiments using various clustering evaluation metrics on 15 scRNA-seq datasets from different sequencing platforms. Experimental results indicate that scMAE outperforms other state-of-the-art methods on these datasets. In addition, scMAE accurately identifies rare cell types, which are challenging to detect due to their low abundance. Furthermore, biological analyses confirm the biological significance of the identified cell subpopulations. AVAILABILITY AND IMPLEMENTATION: The source code of scMAE is available at: https://zenodo.org/records/10465991.
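The masking step can be pictured as a corruption routine that also records which entries were altered, because those binary labels are what a masking predictor learns to recover. This is a hedged sketch: the abstract does not say how masked values are filled in, so replacing a masked entry with the same gene's value from a randomly drawn cell, and the `mask_rate` and `seed` parameters, are assumptions.

```python
import random


def mask_expression(matrix, mask_rate=0.2, seed=0):
    """Corrupt a cells x genes matrix and return (corrupted, mask).

    Each entry is masked with probability mask_rate. A masked entry is
    replaced by the same gene's value from a randomly chosen cell (which may
    occasionally be the cell itself). mask[i][j] is 1 where entry (i, j) was
    masked and 0 otherwise; these are the masking predictor's targets.
    The input matrix is not modified.
    """
    rng = random.Random(seed)
    n_cells = len(matrix)
    n_genes = len(matrix[0])
    corrupted = [row[:] for row in matrix]
    mask = [[0] * n_genes for _ in range(n_cells)]
    for i in range(n_cells):
        for j in range(n_genes):
            if rng.random() < mask_rate:
                donor = rng.randrange(n_cells)
                corrupted[i][j] = matrix[donor][j]
                mask[i][j] = 1
    return corrupted, mask
```

An autoencoder would then take `corrupted` as input and be trained both to reconstruct `matrix` and to predict `mask`.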


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis , Algorithms
3.
Methods ; 222: 1-9, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38128706

ABSTRACT

The development of single-cell RNA sequencing (scRNA-seq) has provided new perspectives for studying biological problems at the single-cell level. One of the key issues in scRNA-seq data analysis is dividing cells into clusters to discover cellular heterogeneity and diversity. However, scRNA-seq data are high-dimensional, sparse, and noisy, which challenges existing single-cell clustering methods. In this study, we propose a joint learning framework (JLONMFSC) for clustering scRNA-seq data. In our method, the dimensionality of the original data is reduced to minimize the effect of noise. In addition, graph-regularized matrix factorization is used to learn local features, and Low-Rank Representation (LRR) subspace clustering is used to learn global features. Finally, local and global features are learned jointly to obtain the clustering results. We compare the proposed algorithm with eight state-of-the-art algorithms on six datasets, and the experimental results demonstrate that JLONMFSC achieves better clustering performance on all datasets. The code is available at https://github.com/lanbiolab/JLONMFSC.


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
4.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38145950

ABSTRACT

Single-cell sequencing technology has provided unprecedented opportunities for comprehensively deciphering cell heterogeneity. Nevertheless, the high dimensionality and intricate nature of cell heterogeneity present substantial challenges to computational methods. Numerous novel clustering methods have been proposed to address this issue; however, none of them consistently achieves better performance across different biological scenarios. In this study, we developed CAKE, a novel and scalable self-supervised clustering method, which consists of a contrastive learning model with a mixture neighborhood augmentation for cell representation learning, and a self-Knowledge Distiller model for the refinement of clustering results. These designs provide more condensed and cluster-friendly cell representations and improve clustering accuracy and robustness. Furthermore, in addition to accurately identifying major cell types, CAKE can also find more biologically meaningful cell subgroups and rare cell types. Comprehensive experiments on real single-cell RNA sequencing datasets demonstrated the superiority of CAKE over other methods in visualization and clustering, and indicated its broad applicability to cell heterogeneity analysis. Contact: Ruiqing Zheng (rqzheng@csu.edu.cn).


Subject(s)
Algorithms , Learning , Cluster Analysis , Sequence Analysis, RNA
5.
Methods ; 220: 90-97, 2023 12.
Article in English | MEDLINE | ID: mdl-37952704

ABSTRACT

For a given single-cell RNA-seq dataset, it is critical to pinpoint key cellular stages and quantify cells' differentiation potency along a differentiation pathway in a time-course manner. Currently, several methods based on the entropy of gene functions or PPI networks have been proposed to solve this problem. Nevertheless, these methods still suffer from inaccurate interactions and noise originating from scRNA-seq profiles. In this study, we proposed a cell potency inference method based on cell-specific network entropy, called SPIDE. SPIDE introduces a local weighted cell-specific network for each cell to maintain cell heterogeneity and calculates the entropy by incorporating gene expression with network structure. We compared three cell entropy estimation models on eight scRNA-seq datasets. The results show that SPIDE's conclusions are consistent with real cell differentiation potency on most datasets. Moreover, SPIDE accurately recovers the continuous changes of potency during cell differentiation and significantly correlates with the stemness of tumor cells in colorectal cancer. In conclusion, our study provides a universal and accurate framework for cell entropy estimation, which deepens our understanding of cell differentiation, disease development, and other related biological research.
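The abstract does not give SPIDE's entropy formula or how its cell-specific networks are built. As a point of reference only, the sketch below computes a generic expression-weighted network entropy for one cell: each gene's local entropy over its neighbors' expression is weighted by the gene's expression times its neighbors' total expression. That weighting, the fixed input network, and all names are assumptions. In entropy-based potency scoring, higher values are conventionally read as higher differentiation potency.

```python
from math import log


def network_entropy(expr, adjacency):
    """Expression-weighted network entropy (natural log) for a single cell.

    expr:      dict gene -> non-negative expression in this cell
    adjacency: dict gene -> set of neighboring genes (undirected network)
    For gene i with neighbors N(i), transition probabilities are
    p_ij = x_j / sum_{k in N(i)} x_k, and the local entropy is
    S_i = -sum_j p_ij * log(p_ij). Local entropies are averaged with
    weights proportional to x_i * sum_{k in N(i)} x_k.
    """
    local = {}
    weights = {}
    for i, nbrs in adjacency.items():
        mass = sum(expr[j] for j in nbrs)
        if mass == 0:
            continue  # no expressed neighbors: gene contributes nothing
        probs = [expr[j] / mass for j in nbrs if expr[j] > 0]
        local[i] = -sum(p * log(p) for p in probs)
        weights[i] = expr[i] * mass
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(weights[i] * local[i] for i in local) / total
```

On a three-gene path a-b-c with equal expression, only the middle gene has two equally weighted neighbors, so the score is ln(2)/2 (about 0.347); concentrating expression on fewer genes lowers it.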


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Entropy , Cell Differentiation/genetics , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods
6.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37715282

ABSTRACT

Gene regulatory networks play a crucial role in controlling the biological processes of living organisms. Deciphering complex gene regulatory networks from experimental data remains a major challenge in systems biology. Recent advances in single-cell RNA sequencing technology bring massive high-resolution data, enabling computational inference of cell-specific gene regulatory networks (GRNs). Many relevant algorithms have been developed to achieve this goal in recent years. However, GRN inference remains far from ideal due to the extra noise in pseudo-time information and the large number of dropouts in datasets. Here, we present a novel GRN inference method named Normi, which is based on non-redundant mutual information. Normi addresses these problems by sliding a fixed-size window along the entire trajectory and averaging the gene expression of the cells in each window to obtain representative cells. To further alleviate the impact of dropouts, we use the mixed KSG estimator to quantify the high-order time-delayed mutual information among genes, and then filter out redundant edges with the Max-Relevance and Min-Redundancy algorithm. Moreover, we determined the optimal time delay for each gene pair by distance correlation. Normi outperforms other state-of-the-art GRN inference methods on both simulated data and single-cell RNA sequencing (scRNA-seq) datasets, demonstrating its robustness. The performance of Normi on real scRNA-seq data further reveals its ability to identify key regulators and crucial biological processes.
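Normi's smoothing step orders cells along the trajectory and summarizes each fixed-size window by the average profile of its cells. The sketch below assumes pseudotime values are already available and that the window advances one cell at a time (`step=1`); the stride and all names are assumptions rather than details from the abstract.

```python
def sliding_window_smooth(cells, pseudotime, window_size, step=1):
    """Average expression over fixed-size windows of pseudotime-ordered cells.

    cells:      list of expression vectors (one list of floats per cell)
    pseudotime: list of floats, one per cell
    Returns one averaged ("representative") vector per window position.
    """
    if window_size > len(cells):
        raise ValueError("window_size exceeds the number of cells")
    # Order cells along the trajectory.
    order = sorted(range(len(cells)), key=lambda i: pseudotime[i])
    ordered = [cells[i] for i in order]
    n_genes = len(ordered[0])
    representatives = []
    for start in range(0, len(ordered) - window_size + 1, step):
        window = ordered[start:start + window_size]
        representatives.append(
            [sum(cell[g] for cell in window) / window_size for g in range(n_genes)]
        )
    return representatives
```

The representative cells form a smoothed time series per gene, which is the kind of input a time-delayed mutual information estimator would then consume.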


Subject(s)
Algorithms , Gene Regulatory Networks
7.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37584660

ABSTRACT

MOTIVATION: scATAC-seq has enabled chromatin accessibility landscape profiling at the single-cell level, providing opportunities for determining cell-type-specific regulatory codes. However, the high dimensionality, extreme sparsity, and large scale of scATAC-seq data pose great challenges to cell-type identification. Thus, there has been growing interest in leveraging well-annotated scRNA-seq data to help annotate scATAC-seq data. However, substantial computational obstacles remain in transferring information from scRNA-seq to scATAC-seq, especially because of their heterogeneous features. RESULTS: We propose a new transfer learning method, scNCL, which utilizes prior knowledge and contrastive learning to tackle the problem of heterogeneous features. Briefly, scNCL transforms scATAC-seq features into a gene activity matrix based on prior knowledge. Since feature transformation can cause information loss, scNCL introduces neighborhood contrastive learning to preserve the neighborhood structure of scATAC-seq cells in the raw feature space. To learn transferable latent features, scNCL uses a feature projection loss and an alignment loss to harmonize embeddings between scRNA-seq and scATAC-seq. Experiments on various datasets demonstrated that scNCL not only realizes accurate and robust label transfer for common types, but also achieves reliable detection of novel types. scNCL is also computationally efficient and scalable to million-scale datasets. Moreover, we show that scNCL can help refine cell-type annotations in existing scATAC-seq atlases. AVAILABILITY AND IMPLEMENTATION: The source code and data used in this paper can be found at https://github.com/CSUBioGroup/scNCL-release.
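One common convention for turning peak-level scATAC-seq counts into a gene activity matrix is to sum, for each cell, the counts of peaks that overlap the gene body extended 2 kb upstream of the transcription start site. The sketch below follows that convention as an assumption; the abstract only says scNCL uses prior knowledge, and the coordinates, the `upstream` default, and all names are hypothetical.

```python
def gene_activity(peak_counts, peaks, genes, upstream=2000):
    """Sum each cell's peak counts over peaks overlapping each gene's window.

    peak_counts: list of per-cell lists, one count per peak
    peaks:       list of (chrom, start, end) tuples, half-open intervals
    genes:       dict gene_name -> (chrom, start, end, strand)
    The window is the gene body extended `upstream` bp on its 5' side
    (before `start` on the + strand, after `end` on the - strand).
    Returns dict gene_name -> list of per-cell activity scores.
    """
    activity = {}
    for name, (chrom, start, end, strand) in genes.items():
        if strand == "+":
            w_start, w_end = max(0, start - upstream), end
        else:
            w_start, w_end = start, end + upstream
        # Indices of peaks on the same chromosome that overlap the window.
        hits = [p for p, (pc, ps, pe) in enumerate(peaks)
                if pc == chrom and ps < w_end and pe > w_start]
        activity[name] = [sum(cell[p] for p in hits) for cell in peak_counts]
    return activity
```

The resulting genes x cells scores share a feature space with scRNA-seq, which is what makes the subsequent alignment between modalities possible; information outside these windows (for example distal enhancers) is lost, as the abstract notes.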


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Software , Chromatin , Sequence Analysis, RNA/methods
8.
Methods ; 216: 21-38, 2023 08.
Article in English | MEDLINE | ID: mdl-37315825

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) data contain a large number of zeros. Such dropout events impede downstream data analyses. We propose BayesImpute to infer and impute dropouts in scRNA-seq data. Using the expression rate and coefficient of variation of genes within each cell subpopulation, BayesImpute first determines likely dropouts, then constructs the posterior distribution for each gene and uses the posterior mean to impute dropout values. Experiments on simulated and real data show that BayesImpute can effectively identify dropout events and reduce the introduction of false-positive signals. Additionally, BayesImpute successfully recovers the true expression levels of missing values, restores gene-to-gene and cell-to-cell correlation coefficients, and maintains the biological information in bulk RNA-seq data. Furthermore, BayesImpute improves the clustering and visualization of cell subpopulations and the identification of differentially expressed genes. We further demonstrate that, compared with other statistics-based imputation methods, BayesImpute is scalable and fast with minimal memory usage.
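The two-stage logic (flag likely dropouts, then fill them with a posterior mean) can be sketched for one gene in one subpopulation. This is not BayesImpute's model: the thresholds `rate_min` and `cv_max`, computing the coefficient of variation on nonzero values only, and the normal likelihood with a normal prior (`prior_mean`, `prior_var`) are all hypothetical choices made to keep the example small.

```python
from statistics import mean, pstdev, pvariance


def impute_gene(values, rate_min=0.5, cv_max=1.0, prior_mean=0.0, prior_var=1.0):
    """Impute likely dropouts for one gene within one cell subpopulation.

    values: expression of the gene in every cell of the subpopulation.
    Zeros are treated as dropouts only if the gene is expressed in at least
    `rate_min` of the cells and the coefficient of variation of its nonzero
    values is at most `cv_max`. Dropouts are replaced by the posterior mean
    of a normal-normal conjugate model fitted to the nonzero values.
    """
    nonzero = [v for v in values if v > 0]
    if not nonzero:
        return list(values)
    rate = len(nonzero) / len(values)
    mu = mean(nonzero)
    cv = pstdev(nonzero) / mu
    if rate < rate_min or cv > cv_max:
        return list(values)  # zeros kept as likely true (biological) zeros
    n = len(nonzero)
    noise_var = max(pvariance(nonzero), 1e-8)  # guard against zero variance
    post_precision = 1.0 / prior_var + n / noise_var
    post_mean = (prior_mean / prior_var + n * mu / noise_var) / post_precision
    return [post_mean if v == 0 else v for v in values]
```

With a prior centered at zero, the imputed value is shrunk slightly below the mean of the observed values (about 2.45 rather than 3.0 for `[2.0, 4.0, 0.0, 3.0]`), a conservative choice; a gene expressed in only a quarter of cells keeps its zeros.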


Subject(s)
Single-Cell Gene Expression Analysis , Software , Sequence Analysis, RNA/methods , Bayes Theorem , Single-Cell Analysis/methods , Probability , Gene Expression Profiling
9.
Bioinformatics ; 39(39 Suppl 1): i368-i376, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387178

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) offers a powerful tool to dissect the complexity of biological tissues through cell sub-population identification in combination with clustering approaches. Feature selection is a critical step for improving the accuracy and interpretability of single-cell clustering. Existing feature selection methods underutilize the discriminatory potential of genes across distinct cell types. We hypothesize that incorporating such information could further boost the performance of single-cell clustering. RESULTS: We develop CellBRF, a feature selection method that considers genes' relevance to cell types for single-cell clustering. The key idea is to identify genes that are most important for discriminating cell types through random forests guided by predicted cell labels. Moreover, CellBRF employs a class balancing strategy to mitigate the impact of unbalanced cell type distributions on feature importance evaluation. We benchmark CellBRF on 33 scRNA-seq datasets representing diverse biological scenarios and demonstrate that it substantially outperforms state-of-the-art feature selection methods in terms of clustering accuracy and cell neighborhood consistency. Furthermore, we demonstrate the outstanding performance of our selected features through three case studies on cell differentiation stage identification, non-malignant cell subtype identification, and rare cell identification. CellBRF provides a new and effective tool to boost single-cell clustering accuracy. AVAILABILITY AND IMPLEMENTATION: All source code of CellBRF is freely available at https://github.com/xuyp-csu/CellBRF.


Subject(s)
Benchmarking , Random Forest , Cell Differentiation , Cluster Analysis
10.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 2007-2015, 2023.
Article in English | MEDLINE | ID: mdl-37015596

ABSTRACT

Advances in single-cell RNA sequencing (scRNA-seq) technologies allow researchers to analyze genome-wide transcription profiles and to solve biological problems at individual-cell resolution. However, existing scRNA-seq clustering methods are hampered by the high dropout rate and the curse of dimensionality in the data. Here, we propose a novel pipeline, scBKAP, the cornerstone of which is a single-cell bisecting K-means clustering method based on an autoencoder network and a dimensionality reduction model, MPDR. Specifically, scBKAP uses an autoencoder network to reconstruct gene expression values from scRNA-seq data to alleviate the dropout issue, and the MPDR model, composed of the M3Drop feature selection algorithm and the PHATE dimensionality reduction algorithm, to reduce the dimensions of the reconstructed data. The dimensionality-reduced data are then fed into the bisecting K-means clustering algorithm to identify cell clusters. Comprehensive experiments on 21 public scRNA-seq datasets and simulated datasets demonstrate scBKAP's superior performance over nine state-of-the-art single-cell clustering methods. The source code and datasets are available at https://github.com/YuBinLab-QUST/scBKAP/ and https://doi.org/10.24433/CO.4592131.v1.
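The final clustering stage can be illustrated with plain bisecting K-means on already-reduced data; the autoencoder and MPDR steps are omitted. This is a generic sketch, not scBKAP's code: splitting the cluster with the largest within-cluster sum of squares (rather than, say, the largest cluster) and the deterministic farthest-point initialization are assumptions.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def _sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    c = _centroid(points)
    return sum(_sq_dist(p, c) for p in points)


def _two_means(points, iters=20):
    """Split points into two groups with Lloyd's algorithm (k = 2)."""
    # Deterministic init: the point farthest from the centroid, then the
    # point farthest from that one.
    c = _centroid(points)
    a = max(points, key=lambda p: _sq_dist(p, c))
    b = max(points, key=lambda p: _sq_dist(p, a))
    centers = [a, b]
    groups = [list(points), []]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            k = 0 if _sq_dist(p, centers[0]) <= _sq_dist(p, centers[1]) else 1
            groups[k].append(p)
        if not groups[0] or not groups[1]:
            break
        new_centers = [_centroid(g) for g in groups]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return groups


def bisecting_kmeans(points, n_clusters):
    """Repeatedly bisect the cluster with the largest SSE until n_clusters exist."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        idx = max(range(len(clusters)), key=lambda i: _sse(clusters[i]))
        left, right = _two_means(clusters[idx])
        if not left or not right:
            break  # the selected cluster cannot be split (e.g. identical points)
        clusters[idx:idx + 1] = [left, right]
    return clusters
```

Because each step only runs 2-means on one cluster, the procedure builds a divisive hierarchy and is less sensitive to initialization than running K-means with all K centers at once.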


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
11.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36821425

ABSTRACT

MOTIVATION: Integrating the growing number of single-cell RNA sequencing datasets helps us better understand cellular identity and function. The major challenge for integration is removing batch effects while preserving biological heterogeneity. Advances in contrastive learning have inspired several contrastive learning-based batch correction methods. However, existing contrastive learning-based methods exhibit a noticeable ad hoc trade-off between batch mixing and preservation of cellular heterogeneity (mix-heterogeneity trade-off). Therefore, a deliberate mix-heterogeneity trade-off is expected to yield considerable improvements in scRNA-seq dataset integration. RESULTS: We develop a novel contrastive learning-based batch correction framework, CLAIRE, which achieves a superior mix-heterogeneity trade-off. The key contribution of CLAIRE is two complementary strategies, a construction strategy and a refinement strategy, that improve the appropriateness of positive pairs. The construction strategy dynamically generates positive pairs by augmenting inter-batch mutual nearest neighbors (MNN) with intra-batch k-nearest neighbors (KNN), which improves the coverage of positive pairs over the whole distribution of cell types shared between batches. The refinement strategy automatically reduces potential false-positive pairs from the construction strategy by exploiting the memory effect of deep neural networks. We demonstrate that CLAIRE achieves a better mix-heterogeneity trade-off than existing contrastive learning-based methods. Benchmark results on six real datasets also show that CLAIRE achieves the best integration performance against eight state-of-the-art methods. Finally, comprehensive experiments are conducted to validate the effectiveness of CLAIRE. AVAILABILITY AND IMPLEMENTATION: The source code and data used in this study can be found at https://github.com/CSUBioGroup/CLAIRE-release.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
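The construction strategy (MNN anchors augmented with intra-batch KNN) can be sketched with brute-force nearest-neighbor search. This is one way to realize the idea, not CLAIRE's implementation: pairing every neighbor of anchor cell i with every neighbor of its partner j, the Euclidean metric, and the parameters `k_mnn` and `k_knn` are assumptions, and the memory-effect refinement step is not shown.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(query, reference, k):
    """For each query point, indices of its k nearest points in `reference`."""
    return [sorted(range(len(reference)), key=lambda j: _sq_dist(q, reference[j]))[:k]
            for q in query]


def mnn_pairs(batch_a, batch_b, k):
    """Inter-batch mutual nearest neighbors as (index_in_a, index_in_b) pairs."""
    a_to_b = knn(batch_a, batch_b, k)
    b_to_a = knn(batch_b, batch_a, k)
    return [(i, j) for i, nbrs in enumerate(a_to_b) for j in nbrs if i in b_to_a[j]]


def augmented_positive_pairs(batch_a, batch_b, k_mnn, k_knn):
    """Expand each MNN anchor (i, j) with intra-batch neighbors of i and of j.

    Every cell in the neighborhood of i (batch A) is paired with every cell in
    the neighborhood of j (batch B). k_knn + 1 neighbors are requested because
    each cell is normally its own nearest neighbor (distance 0).
    """
    knn_a = knn(batch_a, batch_a, k_knn + 1)
    knn_b = knn(batch_b, batch_b, k_knn + 1)
    pairs = set()
    for i, j in mnn_pairs(batch_a, batch_b, k_mnn):
        for i2 in knn_a[i]:
            for j2 in knn_b[j]:
                pairs.add((i2, j2))
    return sorted(pairs)
```

In a toy case with two well-separated cell groups per batch, the augmented pairs stay within matching groups; with a large `k_knn`, neighborhoods start to cross group boundaries and produce the kind of false-positive pairs the refinement strategy is meant to reduce.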


Subject(s)
Single-Cell Analysis , Software , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Neural Networks, Computer , Cluster Analysis
12.
Surg Radiol Anat ; 45(3): 241-246, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36715709

ABSTRACT

OBJECTIVE: The purpose of this study was to investigate the morphological classification and clinical significance of the vertebral artery sulcus of the atlas based on CT three-dimensional reconstruction. METHODS: Three-dimensional reconstruction images of 300 adult atlases were collected, and a total of 600 atlas vertebral artery sulci were selected. The parameters required for placement of C1 pedicle screws, including depth of grinding drilling (ao), width (cd), length (ab), height (H), lateral wall thickness (L1), inner wall thickness (L2), medial angle (∠α), and the cephalad angle of the atlas pedicle relative to the transverse plane (∠ß), were measured. RESULTS: CT three-dimensional reconstruction images showed five types of atlas vertebral artery sulci: no process type (n = 494, 82.33%), upper process type (n = 29, 4.83%), lower process type (n = 25, 4.17%), double process type (n = 19, 3.17%), and posterior ring type (n = 33, 5.50%). One-way ANOVA showed that the five groups differed significantly in ao, L2, H, ∠α, and ∠ß. One-way ANOVA with LSD post hoc tests showed that ao in the no process group was smaller than in the upper or lower process groups (P < 0.05), and ao in the lower process or posterior ring group was smaller than in the upper process group (P < 0.05). ao was larger in the male group than in the female group. CONCLUSION: The no process type of atlas vertebral artery sulcus was the most common, and the medial and cephalad angles of the atlas pedicle in this type were the smallest. When pedicle screws are inserted, these two angles should not be too large. ao was larger in males than in females. All these findings should be considered to avoid deviation of the screw trajectory.


Subject(s)
Pedicle Screws , Vertebral Artery , Adult , Humans , Male , Female , Vertebral Artery/diagnostic imaging , Imaging, Three-Dimensional , Clinical Relevance , Tomography, X-Ray Computed
13.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36567258

ABSTRACT

Single-cell RNA-sequencing technology (scRNA-seq) brings research to single-cell resolution. However, a major drawback of scRNA-seq is its high sparsity, i.e. expressed genes that receive no reads due to technical noise or limited sequencing depth during the scRNA-seq protocol. These are called 'dropout' events, and they are likely to affect downstream analyses such as differential expression analysis, clustering and visualization of cell subpopulations, and cellular trajectory inference. Therefore, there is a need for a method to identify and impute these dropout events. We propose Bubble, which first identifies dropout events among all zeros based on the expression rate and coefficient of variation of genes within each cell subpopulation, and then leverages an autoencoder constrained by bulk RNA-seq data to impute only those values. Unlike other deep learning-based imputation methods, Bubble fuses matched bulk RNA-seq data as a constraint to reduce the introduction of false-positive signals. Using simulated and several real scRNA-seq datasets, we demonstrate that Bubble enhances the recovery of missing values and of gene-to-gene and cell-to-cell correlations, and reduces the introduction of false-positive signals. In crucial downstream analyses of scRNA-seq data, Bubble facilitates the identification of differentially expressed genes, improves clustering and visualization, and aids the construction of cellular trajectories. More importantly, Bubble provides fast and scalable imputation with minimal memory usage.


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , RNA-Seq , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software
14.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38189544

ABSTRACT

With the development of spatially resolved transcriptomics technologies, it is now possible to explore the gene expression profiles of single cells while preserving their spatial context. Spatial clustering plays a key role in spatial transcriptome data analysis. In the past 2 years, several graph neural network-based methods have emerged that significantly improve the accuracy of spatial clustering. However, accurately identifying the boundaries of spatial domains remains challenging. In this article, we propose stAA, an adversarial variational graph autoencoder, to identify spatial domains. stAA generates cell embeddings by leveraging gene expression and spatial information with graph neural networks, and pushes the distribution of cell embeddings toward a prior distribution through the Wasserstein distance. The adversarial training process helps the cell embeddings better capture spatial domain information and makes them more robust. Moreover, stAA incorporates global graph information into cell embeddings using labels generated by pre-clustering. Our experimental results show that stAA outperforms state-of-the-art methods and achieves better clustering results across different profiling platforms and resolutions. We also conducted numerous biological analyses and found that stAA can identify fine-grained structures in tissues, recognize different functional subtypes within tumors, and accurately identify developmental trajectories.


Subject(s)
Gene Expression Profiling , Transcriptome , Cluster Analysis , Neural Networks, Computer
15.
Front Pharmacol ; 13: 898519, 2022.
Article in English | MEDLINE | ID: mdl-36105222

ABSTRACT

Background: Accurate target identification of small molecules and downstream target annotation are important in pharmaceutical research and drug development. Methods: We present TAIGET, a user-friendly, easy-to-operate graphical web interface, which consists of a docking module based on AutoDock Vina and LeDock, a target screening module based on a Bayesian-Gaussian mixture model (BGMM), and a target annotation module derived from >14,000 cancer-related publications. Results: TAIGET produces binding poses for ≤5 proteins at a time, selected from the UniProt ID-PDB network, and ≤3 ligands at a time, submitted in SMILES format. Once binding poses have been identified, TAIGET screens potential targets based on the BGMM. In addition, three medical experts and 10 medical students curated associations among drugs, genes, gene regulation, cancer outcome phenotypes, 2,170 cancer cell types, and 73 cancer types from the PubMed literature to construct the target annotation module. A target-related PPI network can be visualized through an interactive interface. Conclusion: This online tool significantly lowers the entry barrier to virtual target identification for users who are not experts in the technical aspects of virtual drug discovery. The web server is available free of charge at http://www.taiget.cn/.

16.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35901449

ABSTRACT

Integrating single-cell transcriptome datasets from multiple sources plays an important role in investigating complex biological systems. The key to integrating transcriptome datasets is batch effect removal. Recent methods attempt to apply a contrastive learning strategy to correct batch effects. Despite their encouraging performance, the optimal contrastive learning framework for batch effect removal is still under exploration. We develop an improved contrastive learning-based batch correction framework, GLOBE. GLOBE defines adaptive translation transformations for each cell to guarantee the stability of approximating batch effects. To enhance the consistency of representation alignment, GLOBE uses a loss function that is both hardness-aware and consistency-aware to learn batch effect-invariant representations. Moreover, GLOBE computes a batch-corrected gene matrix in a transparent manner to support diverse downstream analyses. Benchmarking results on a wide spectrum of datasets show that GLOBE outperforms other state-of-the-art methods in terms of robust batch mixing and superior conservation of biological signals. We further apply GLOBE to integrate two developing mouse neocortex datasets and show that GLOBE succeeds in removing batch effects while preserving the contiguous structure of cells in the raw data. Finally, a comprehensive study is conducted to validate the effectiveness of GLOBE.


Subject(s)
Benchmarking , Transcriptome , Animals , Mice
17.
Methods ; 205: 114-122, 2022 09.
Article in English | MEDLINE | ID: mdl-35777719

ABSTRACT

The rapid development of single-cell sequencing technologies makes it possible to analyze cellular heterogeneity at the single-cell level. Cell clustering is one of the most fundamental and common steps in heterogeneity analysis. However, due to high noise levels, high dimensionality, and high sparsity, accurate cell clustering is still challenging. Here, we present DeepCI, a new clustering approach for scRNA-seq data. Using two autoencoders to obtain cell embeddings and gene embeddings, DeepCI can simultaneously learn low-dimensional cell representations and clustering. In addition, the recovered gene expression matrix can be obtained by multiplying the cell and gene embeddings. To evaluate the performance of DeepCI, we applied it to several real scRNA-seq datasets for clustering and visualization analysis. The experimental results show that DeepCI achieves better overall performance than several popular single-cell analysis methods. We also evaluated the imputation performance of DeepCI in a dedicated experiment. The corresponding results show that the imputed expression of known specific marker genes can greatly improve the accuracy of cell type classification.
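Recovering the expression matrix from the two embeddings is an ordinary matrix product. Assuming cell embeddings U (cells x d) and gene embeddings V (genes x d) share the latent dimension d, the reconstruction is X̂ = U Vᵀ. The plain-Python helper below only illustrates the shapes involved; its name and layout are assumptions, not DeepCI's code.

```python
def reconstruct(cell_emb, gene_emb):
    """Return X_hat = U @ V^T as nested lists (cells x genes).

    cell_emb: n_cells rows of length d (U)
    gene_emb: n_genes rows of length d (V)
    Entry (c, g) is the dot product of cell c's and gene g's embeddings.
    """
    d = len(cell_emb[0])
    if any(len(row) != d for row in gene_emb):
        raise ValueError("cell and gene embeddings must share dimension d")
    return [[sum(u[k] * v[k] for k in range(d)) for v in gene_emb]
            for u in cell_emb]
```

For example, `reconstruct([[1, 0], [0, 2]], [[1, 1], [2, 0], [0, 3]])` yields a 2 x 3 matrix `[[1, 2, 0], [2, 0, 6]]`: every entry, including positions that were zero in the input, receives a value, which is how such a factorization can act as imputation.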


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Cluster Analysis , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
18.
IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 1042-1049, 2022.
Article in English | MEDLINE | ID: mdl-33035155

ABSTRACT

Gene regulatory networks (GRNs) are involved in various biological processes, such as the cell cycle, differentiation, and apoptosis. The large amount of existing expression data, especially time-series expression data, provides an opportunity to infer GRNs computationally. These data can reveal the dynamics of gene expression and imply regulatory relationships among genes. However, identifying indirect regulatory links is still a major challenge, as most studies treat time points as independent observations while ignoring the influence of time delays. In this study, we propose a GRN inference method based on information-theoretic measures, called NIMCE. NIMCE uses transfer entropy to measure the regulatory links between each pair of genes, then applies causation entropy to filter indirect relationships. In addition, NIMCE applies multiple time delays to identify indirect regulatory relationships from candidate genes. Experiments on simulated and colorectal cancer data show that NIMCE outperforms other competing methods. All data and code used in this study are publicly available at https://github.com/CSUBioGroup/NIMCE.
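Transfer entropy from gene X to gene Y at delay τ asks how much the value of X τ steps back reduces uncertainty about Y's next value beyond what Y's own previous value already explains. The sketch below is a generic plug-in estimator for discrete (already binned) series with one step of target history; the binning, the history length, and the function name are assumptions, and NIMCE's causation-entropy filtering step is not shown.

```python
from collections import Counter
from math import log2


def transfer_entropy(x, y, delay=1):
    """Plug-in estimate (bits) of transfer entropy from x to y for discrete series.

    Uses one step of target history and the source value `delay` steps
    before the predicted target sample:
    TE = sum p(y_{t+1}, y_t, x_{t+1-delay})
         * log2[ p(y_{t+1} | y_t, x_{t+1-delay}) / p(y_{t+1} | y_t) ]
    x and y must have the same length.
    """
    if delay < 1:
        raise ValueError("delay must be >= 1")
    triples = [(y[t + 1], y[t], x[t + 1 - delay])
               for t in range(delay - 1, len(y) - 1)]
    n = len(triples)
    c_full = Counter(triples)                            # (y_{t+1}, y_t, x_src)
    c_hist_src = Counter((y0, xs) for _, y0, xs in triples)  # (y_t, x_src)
    c_next_hist = Counter((y1, y0) for y1, y0, _ in triples)  # (y_{t+1}, y_t)
    c_hist = Counter(y0 for _, y0, _ in triples)         # y_t
    te = 0.0
    for (y1, y0, xs), c in c_full.items():
        p_joint = c / n
        p_given_both = c / c_hist_src[(y0, xs)]
        p_given_hist = c_next_hist[(y1, y0)] / c_hist[y0]
        te += p_joint * log2(p_given_both / p_given_hist)
    return te
```

Scanning `delay` over a small range and comparing the estimates is one simple way to see at which lag a candidate regulator is most informative; for a target that copies a random binary source two steps later, the estimate is close to 1 bit at `delay=2` and close to 0 at `delay=1`.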


Subject(s)
Gene Regulatory Networks , Causality , Entropy , Gene Regulatory Networks/genetics , Time Factors
19.
IEEE/ACM Trans Comput Biol Bioinform ; 19(3): 1592-1602, 2022.
Article in English | MEDLINE | ID: mdl-33417563

ABSTRACT

The biological functions of a cell are typically carried out through protein complexes. Detecting protein complexes is therefore of great significance for understanding cellular organization and protein functions. In past decades, many computational methods have been proposed to detect protein complexes. However, most existing methods only search local topological information to mine dense subgraphs as protein complexes, ignoring global topological information. To tackle this issue, we propose DPCMNE, a method that detects protein complexes via multi-level network embedding. It can preserve both the local and global topological information of biological networks. First, DPCMNE employs a hierarchical compressing strategy to recursively compress the input protein-protein interaction (PPI) network into multi-level smaller PPI networks. Then, a network embedding method is applied to these smaller PPI networks to learn protein embeddings at different levels of granularity. The embeddings learned from all the compressed PPI networks are concatenated to represent the final protein embeddings of the original input PPI network. Finally, a core-attachment-based strategy is adopted to detect protein complexes in the weighted PPI network constructed from the pairwise similarity of protein embeddings. To assess the performance of the proposed method, DPCMNE is compared with eight other clustering algorithms on two yeast datasets. The experimental results show that DPCMNE outperforms these state-of-the-art complex detection methods in terms of F1 and F1+Acc. Furthermore, functional enrichment analysis indicates that the protein complexes detected by DPCMNE are more biologically significant in terms of P-score.


Subject(s)
Protein Interaction Mapping , Protein Interaction Maps , Algorithms , Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/metabolism , Saccharomyces cerevisiae/metabolism
20.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34864877

ABSTRACT

Increasing evidence has shown that circRNA plays a significant role in the development of many diseases. In addition, many studies have shown that circRNA can be considered a potential biomarker for the clinical diagnosis and treatment of disease. Some computational methods have been proposed to predict circRNA-disease associations. However, the performance of these methods is limited by the sparsity of low-order interaction information. In this paper, we propose a new computational method (KGANCDA) to predict circRNA-disease associations based on a knowledge graph attention network. The circRNA-disease knowledge graphs are constructed by collecting multiple relationship data among circRNAs, diseases, miRNAs, and lncRNAs. Then, the knowledge graph attention network is designed to obtain embeddings of each entity by distinguishing the importance of information from neighbors. Besides low-order neighbor information, it can also capture high-order neighbor information from multisource associations, which alleviates the problem of data sparsity. Finally, a multilayer perceptron is applied to predict the affinity score of circRNA-disease associations based on the embeddings of circRNAs and diseases. The experimental results show that KGANCDA outperforms other state-of-the-art methods in 5-fold cross-validation. Furthermore, the case study demonstrates that KGANCDA is an effective tool for predicting potential circRNA-disease associations.


Subject(s)
MicroRNAs , RNA, Circular , Computational Biology/methods , MicroRNAs/genetics , Neural Networks, Computer , Pattern Recognition, Automated